Deconvolution in white noise with a random blurring function 



Thomas Willer



Laboratoire de Probabilités et Modèles Aléatoires, Université Paris VII (Denis Diderot), 175
rue du Chevaleret, F-75013 Paris, France.



Abstract 

We consider the problem of denoising a function observed after a convolution with a random
filter independent of the noise and satisfying some mean smoothness condition depending on an
ill-posedness coefficient. We establish the minimax rates for the $L^p$ risk over balls of periodic
Besov spaces with respect to the level of noise, and we provide an adaptive estimator achieving
these rates up to log factors. Simulations were performed to highlight the effects of the
ill-posedness and of the distribution of the filter on the efficiency of the estimator.

Keywords: Adaptive estimation; Deconvolution; Inverse problem; Minimax risk; Nonparametric
estimation; Wavelet decomposition.

1 Motivations and preliminaries 
1.1 Inverse problems in practice 

Deconvolution is a particularly important case of a more general class of problems, known
as inverse problems. They consist in recovering an unknown object $f$ from an observation $h_n$
corresponding to $H(f)$ corrupted by a white noise $\xi$, for some operator $H$. The model is of the
kind:

$h_n = H(f) + \sigma n^{-1/2} \xi, \quad \forall n \ge 1. \qquad (1)$

Inverse problems appear in many scientific domains. Several applications can be found for
example in OFTA [1999] in various domains such as meteorology, thermodynamics and mechanics.
Deconvolution, in particular, is a common problem in signal and image processing (see Bertero
and Boccacci [1998]). It appears notably in light detection and ranging devices, which compute
distances to an object by measuring the lapse of time between the emission of laser pulses and
the detection of the pulses reflected by the object. In the underlying model $f$ is a distance to an
object measured up to small Gaussian errors after being blurred by a convolution phenomenon due
to the fact that the system response function of the device is longer than the time resolution
interval of the detector. Several papers deal with this application of deconvolution methods, for
example Harsdorf and Reuter [2000] or Johnstone et al. [2004].



In some cases, it is difficult to know a priori the underlying operator which transformed 
the object to be determined into the observed data. This problem appears notably when the 
operator is sensitive to even slight changes in the experimental conditions, or is affected by 
external random effects that cannot be controlled, and thus changes for every observation. In
these conditions, a framework with a random operator is better suited than a setting with a
fixed deterministic operator.

Figure 1: Reconstruction of a density of activity

As an example let us consider an inverse problem of reconstruction in a tomographic imaging
system, borrowed from OFTA [1999]. The problem is to find the density of activity $f$ of a
radioactive tracer by collecting the $\gamma$ photons which it radiates on a detector. The framework is
illustrated in Figure 1. The setting is such that only the photons transmitted perpendicularly
to the detector are taken into account. A given pixel $A_d$ of the detector collects a number of
photons that depends on the density of activity $f$ along some segment $[F, A_d]$, where $F$ is the
focal point towards which $A_d$ is headed. Each point $M$ of this segment transmits a contribution
$f(M)$ towards $A_d$ but the pixel detects only $a(M, A_d) f(M)$ photons from $M$ because the radiation
diminishes after it has gone across the fluid between $M$ and $A_d$. So the following quantity is
observed on the pixel $A_d$:



$X_\mu f(F, A_d) = \int_{M \in [F, A_d]} f(M)\, a(M, A_d)\, dM,$

and the function $a$ can be put in the following form:

$a(M, A_d) = \exp\Big[ -\int_{M' \in [M, A_d]} \mu(M')\, dM' \Big],$

where $\mu$ is a coefficient quantifying the radiation fading around $M'$. In Figure 1 several zones
characterized by different densities of activity and different coefficients $\mu$ are represented. If
$\mu$ is constant along the segment $[F, A_d]$, then recovering $f$ is a deconvolution problem.

In practice the cartography of $\mu$ is not well known a priori. There is a different function
for each pixel and this function depends on the characteristics of the fluid where the tracers
were injected. Complementary measurements and reconstruction algorithms are necessary to obtain
it. In this context a probabilistic model is useful, where $\mu$ is a random function determined a
posteriori thanks to additional measurements.
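
To see why a constant fading coefficient leads to a deconvolution problem, here is a short worked step. It is only a sketch under an added assumption: points of the segment are parametrized by their abscissa $x$ along the line, with $F$ at abscissa $x_F$ and the pixel $A_d$ at abscissa $x_d$; the kernel name $g_\mu$ is introduced here for illustration.

```latex
\mu \equiv \text{const}
\;\Longrightarrow\;
a(M, A_d) = \exp\big[-\mu\,(x_d - x)\big],
\qquad
\int_{x_F}^{x_d} f(x)\, e^{-\mu (x_d - x)}\, dx = (f * g_\mu)(x_d),
\qquad
g_\mu(u) = e^{-\mu u}\, \mathbf{1}_{\{u \ge 0\}},
```

where $f$ is extended by zero outside the segment. The observation is then the convolution of $f$ with a one-sided exponential kernel whose shape depends on $\mu$.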



1.2 Estimation in inverse problems with random operators 

In the case of deterministic operators, inverse problems have been studied in many papers in a
general framework where (1) holds with some linear operator $H$. Two main methods of estimation
are generally used to recover $f$ from the observation: singular value decomposition (SVD) and
Galerkin projection methods. The former uses a decomposition of $f$ on a basis of eigenfunctions
of $H^T H$, which can be hard to perform if $H$ is difficult to diagonalize. The latter uses a
decomposition of $f$ on a fixed basis adapted to the kind of functions to be estimated and then
consists in solving a finite linear system to recover the coefficients of $f$. Wavelet decomposition
is a very useful tool in such settings, see Donoho [1995] and Abramovich and Silverman [1998].
Among others, a method combining wavelet-vaguelette decompositions and Galerkin projections
can be found in Cohen et al. [2002], whereas a sharp adaptive SVD estimator can be found in
Cavalier and Tsybakov [2002]. Concerning the deconvolution problem, wavelet-based estimation
techniques were developed in Pensky and Vidakovic [1999], Walter and Shen [1999], Fan and Koo
[2002], Kalifa and Mallat [2003] and Johnstone et al. [2004]. Multidimensional situations have
also been considered: minimax rates and estimation techniques can be found in Tsybakov [2000].



Generalisations of inverse problems to the case of random operators have been made in several
recent papers. First, random operators make it possible to treat situations where, in practice,
the operator modifying the object to be estimated is not exactly known because of measurement
errors. In such settings, equation (1) holds with an unknown deterministic operator $H$, and
additional noisy observations provide a random operator $H_\delta$ where $\delta$ is a level of noise:
$H_\delta = H(f) + \delta \xi$. The problem is to build an estimator of $f$ based on the data
$(h_n, H_\delta)$ achieving minimax rates. Several adaptive estimation methods have been developed
in this case. Some are based on SVD methods such as in Cavalier and Hengartner [2004], whereas
estimators based on Galerkin projection methods were developed in Efromovich and Koltchinskii
[2001] or Cohen et al.



Random operators also appear quite naturally in models where the evolution of a random 
process is influenced by its past. For example let us consider the problem of estimating an 
unknown function / thanks to the observation of X n ruled by the following equation (called 
stochastic delay differential equation, SDDE in short): 



$dX_n(t) = \Big( \int_0^r X_n(t - s) f(s)\, ds \Big) dt + \sigma n^{-1/2} dW(t), \quad \forall t \ge 0,$

$X_n(t) = F(t), \quad \forall t \in [-r, 0].$

This problem is close to problem (2): a convolution of the unknown function with the random
filter $X_n$ is observed with small errors. However this filter is not independent of $W$, so our
results do not apply to this particular problem. Numerous estimation results in SDDEs can be
found in Reiss [2004] and in Reiss [2001], with a different asymptotic framework.



The organisation of the paper is as follows. Sections 2, 3 and 4 present respectively the model,
the estimator and the main results. Section 5 gives simulation results where the behaviour of
the estimator is investigated for several distributions of the random filter, and Section 6 gives
the proofs of the theorems.



2 The model 

We consider the following deconvolution problem. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $W$
a standard Wiener process on this space. For a given $n \in \mathbb{N}^*$ we observe the realizations of two
processes $X_n$ and $Y$ linked in the following way:

$dX_n(t) = f * Y(t)\, dt + \sigma n^{-1/2} dW(t), \quad \forall t \in [0, 1], \qquad X_n(0) = x_0, \qquad (2)$

where $*$ denotes the convolution: $f * Y(t) = \int_0^1 f(t - s) Y(s)\, ds$, $x_0$ is a deterministic initial
condition and $\sigma$ is a positive known constant.

The problem is to estimate the 1-periodic function $f$ when $Y$ is independent of $W$ and satisfies
some condition of smoothness.
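
The observation scheme can be illustrated numerically. The following is a minimal sketch, not the simulation code used for Section 5: it assumes a regular grid of m points on [0, 1], approximates the periodic convolution by a Riemann sum computed with the FFT, and draws independent Gaussian increments for the Wiener process. The function names and the grid are illustrative assumptions.

```python
import numpy as np

def circular_convolution(f_vals, y_vals):
    """Riemann-sum approximation of (f * Y)(t) = int_0^1 f(t - s) Y(s) ds on a regular grid."""
    m = len(f_vals)
    # The product of discrete Fourier transforms gives the circular sum over the grid.
    return np.real(np.fft.ifft(np.fft.fft(f_vals) * np.fft.fft(y_vals))) / m

def simulate_increments(f_vals, y_vals, sigma, n, rng):
    """Increments of X_n over grid cells of width 1/m: drift (f * Y)(t_i) dt plus scaled noise."""
    m = len(f_vals)
    dt = 1.0 / m
    drift = circular_convolution(f_vals, y_vals) * dt
    # Wiener increments over a cell of width dt are N(0, dt).
    noise = sigma * n ** -0.5 * rng.normal(0.0, np.sqrt(dt), size=m)
    return drift + noise
```

With `sigma = 0` the increments reduce to the blurred signal times the cell width, which gives a simple way to check the convolution step before adding noise.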



2.1 The target function 

We introduce functional spaces especially useful to describe the target functions. For a given
$p > 1$, let us first denote by $L^p$ the following space:

$L^p([0, 1]) = \{ f : \mathbb{R} \to \mathbb{R} \mid f \text{ is 1-periodic, and } \int_0^1 |f|^p < \infty \}.$

Secondly we use periodic Besov spaces, which are defined thanks to the modulus of continuity in
a similar way as in the non-periodic case (see Johnstone et al. [2004] for the exact definition).
They have the advantage of being very general, including spatially unsmooth functions, and of
being very well suited to wavelet decompositions. Indeed, the following characterization holds
under several conditions on the wavelet basis similar to the conditions in the general case (which
can be found in Härdle et al. [1998]):

$B^s_{\rho,q}([0,1]) = \Big\{ f \in L^\rho([0,1]) \;\Big|\; \|f\|_{s,\rho,q} := \Big( \sum_{j \ge 0} 2^{j(s + 1/2 - 1/\rho) q} \Big( \sum_{0 \le k < 2^j} |\beta_{j,k}|^\rho \Big)^{q/\rho} \Big)^{1/q} < \infty \Big\},$

where the $\beta_{j,k}$ are the wavelet coefficients of $f$.



We investigate the maximal error when $f$ can be any function in a ball of a periodic Besov
space $B^s_{\rho,q}([0, 1])$ of radius $R$ and when the estimation error is measured by the $L^p$-loss. We
suppose that $s > 1/\rho$ so that $f$ is continuous and hence its $L^p$-norm exists.

Definition 1. For given $R > 0$, $\rho > 1$, $q > 1$ and $s > 1/\rho$, define:

$M(s, \rho, q, R) = \{ f \in B^s_{\rho,q}([0,1]) \mid \|f\|_{s,\rho,q} \le R \}.$

Our aim is to determine the rate of the following minimax risk for $p > 1$:

$R_n = \inf_{\hat f_n} \sup_{f \in M(s,\rho,q,R)} E_f( \|\hat f_n - f\|_p ),$

where the infimum is taken over all $\sigma((X_n(t), Y(t))_{t \in [0,1]})$-measurable estimators $\hat f_n$.
2.2 The filter 

We assume that the blurring function $Y$ is a random process independent of $n$, $f$, and (in
probabilistic terms) of the process $W$, and taking its values in $L^2([0, 1])$.

Throughout this paper, we will use the following notations for two functions $A$ and $B$
depending on parameters $p$:

• $A \lesssim B$ means that there exists a positive constant $C$ such that for all $p$, $A(p) \le C B(p)$,

• $A \gtrsim B$ means that $B \lesssim A$,

• $A \asymp B$ means that $A \lesssim B$ and $A \gtrsim B$.

For $j \in \mathbb{N}$ we introduce two random variables $L_j$ and $U_j$ (whenever they exist) linked to
the smoothness of the process $Y$:

= ^= 2J . 1 ll , and Uf = ^ l =° , 
where $(Y_l)_{l \in \mathbb{Z}}$ are the Fourier coefficients of $(Y(t))_{t \in [0,1]}$.

To establish the lower (resp. upper) bound of the minimax risk, we impose the following
control on the distribution of $L_j$ (resp. $U_j$), which implies that the Fourier coefficients are not
too large (resp. small):

$C_{low}$: There exists a constant $\nu > 0$ such that, for all $j \in \mathbb{N}$:

$E(L_j) \lesssim 2^{-2\nu j}.$

$C_{up}$: $\forall l \in \mathbb{Z}$, $Y_l \ne 0$ almost surely, and there exist $\nu > 0$, $c > 0$, $\alpha > 0$ such that, for all $j \in \mathbb{N}$:

$\forall t \ge 0, \quad P(U_j \ge t\, 2^{2\nu j}) \le e^{-c t^{\alpha}}.$

All those conditions are satisfied if the Fourier transform $\hat Y$ of the process $Y$ has the following
form: $|\hat Y(\omega)| = T(\omega) (1 + \omega^2)^{-\nu/2}$, where $T$ is a positive random process with little probability
of taking small or high values (for example bounded almost surely by deterministic constants). This
case includes for example gamma probability distribution functions with some random scale
parameter, which will be used further. On the contrary, condition $C_{up}$ does not hold for filters
with realizations belonging to supersmooth functions, i.e. $Y$ such that
$|\hat Y(\omega)| = T(\omega)\, e^{-B |\omega|^{\beta}} (1 + \omega^2)^{-\nu/2}$,
for some constants $B, \beta > 0$ and with $T$ as before. Results on deconvolution of supersmooth
functions can be found in Butucea.
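
The polynomial-decay example can be checked numerically. The sketch below is illustrative only: it assumes that the dyadic block at level j is the set of frequencies 2^j, ..., 2^(j+1) - 1 and that the two block statistics are the means of |Y_l|^2 and |Y_l|^(-2) over that block; both choices, and the function name, are assumptions made for this example. It takes the deterministic case T = 1.

```python
import numpy as np

def block_means(abs_coeffs, j):
    """Means of |Y_l|^2 and |Y_l|^(-2) over the assumed dyadic block l = 2^j, ..., 2^(j+1) - 1."""
    block = abs_coeffs[2 ** j : 2 ** (j + 1)]
    return np.mean(block ** 2), np.mean(block ** -2.0)

nu = 1.5
freqs = np.arange(0, 2 ** 11, dtype=float)
abs_y = (1.0 + freqs ** 2) ** (-nu / 2)   # |Y_l| = (1 + l^2)^(-nu/2), i.e. T = 1

# Rescaled block means: both should stay within fixed constants as j grows.
ratios = [(block_means(abs_y, j)[0] * 2 ** (2 * nu * j),
           block_means(abs_y, j)[1] * 2 ** (-2 * nu * j)) for j in range(1, 10)]
```

Both rescaled quantities stay bounded in j, which is the deterministic analogue of the two controls required above.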



3 Adaptive estimators 

We first build an adaptive estimator, nearly achieving the minimax rates presented in the next
section, which is close to the one developed in Johnstone et al. [2004] in the case of a deterministic
filter $Y$. The method combines elements of the SVD methods (deconvolution thanks to the
Fourier basis) and of the projection methods (decomposition on a wavelet basis adapted to the
target functions).

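Let us set $R_j = \{0, \dots, 2^j - 1\}$ for all $j \in \mathbb{N}$ and let $(\Psi_{j,k})_{j,k \in \mathbb{Z}}$ denote the periodized
Meyer wavelet basis (see Meyer [1990] or Mallat [1998] for details). For convenience the following
notations will be used further: $R_{-1} = \{0\}$ and $\Psi_{-1,0} = \Phi_{0,0}$. Any 1-periodic target function $f$
belonging to $M(s, \rho, q, R)$ has an expansion of the kind:

$f = \sum_{j \ge -1,\, k \in R_j} \beta_{j,k} \Psi_{j,k}$, where $\beta_{j,k} = \int_0^1 f\, \Psi_{j,k}$.

We estimate $f$ by estimating its wavelet coefficients. Let $(e_l(t))_{l \in \mathbb{Z}} = (\exp(2\pi i l t))_{l \in \mathbb{Z}}$ denote the
Fourier basis, and let $(\Psi_{j,k,l})_{l \in \mathbb{Z}}$, $(f_l)_{l \in \mathbb{Z}}$ and $(Y_l)_{l \in \mathbb{Z}}$ be the Fourier coefficients of the functions
$\Psi_{j,k}$, $f$ and $Y$. Set also: $W_l = \int_0^1 e_l(t)\, dW(t)$ and $X_l^n = \int_0^1 e_l(t)\, dX_n(t)$. Then by Plancherel's
identity we have:

$\beta_{j,k} = \sum_{l \in \mathbb{Z}} f_l\, \overline{\Psi_{j,k,l}}.$

Moreover $\int_0^1 (f * Y)\, e_l = f_l Y_l$, so equation (2) yields:

$X_l^n = f_l Y_l + \sigma n^{-1/2} W_l,$

and thus if we suppose that $Y_l \ne 0$ almost surely for all $l$, $f_l$ can naturally be estimated by
$X_l^n / Y_l$ and we set:

$\hat\beta_{j,k} = \sum_{l \in \mathbb{Z}} \frac{X_l^n}{Y_l}\, \overline{\Psi_{j,k,l}}.$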

Then a hard thresholding estimator is built with the following values for the thresholds $\lambda_j$ and
the highest resolution level $j_1$:

$\lambda_j = \eta\, 2^{\nu j} \sqrt{(\log n)^{1 + 1/\alpha} / n}, \qquad 2^{j_1} = \{ n / (\log n)^{1 + 1/\alpha} \}^{1/(1 + 2\nu)},$

where $\eta$ is a positive constant larger than a threshold (which is determined in Section 6).

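Finally the following estimator achieves the minimax rates up to log factors when the filter
satisfies condition $C_{up}$:

$\hat f_n^D = \sum_{(j,k) \in \Lambda_n} \hat\beta_{j,k}\, \mathbf{1}_{\{|\hat\beta_{j,k}| \ge \lambda_j\}}\, \Psi_{j,k}, \qquad (3)$

where $\Lambda_n = \{(j, k) \in \mathbb{Z}^2 \mid j \in \{-1, \dots, j_1\},\ k \in R_j\}$.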

Moreover we also introduce a slightly different estimator $\hat f_n^R$ with random thresholds instead
of deterministic ones (hence the superscript R instead of D), i.e. with $j_1$ and $\lambda_j$ replaced by $j_2$
and $\tau_j$:

$2^{j_2} = \{ n / \log n \}^{1/(1 + 2\nu)},$
$\tau_j = \eta' \sqrt{U_j \log n / n},$

where $\eta'$ is a large enough constant. The theoretical performances of $\hat f_n^R$ will be studied in a
separate publication; here only a simulation study is provided.
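
The random-threshold rule can be sketched in a few lines. This is an illustration only, not the WaveD implementation: it assumes the level statistics U_j and the estimated coefficients are already available as arrays, and the function names are hypothetical.

```python
import numpy as np

def random_thresholds(u_levels, n, eta):
    """Level thresholds tau_j = eta * sqrt(U_j * log(n) / n), one value per resolution level."""
    return eta * np.sqrt(np.asarray(u_levels, dtype=float) * np.log(n) / n)

def hard_threshold(coeffs_by_level, thresholds):
    """Keep a coefficient only when its absolute value reaches the threshold of its level."""
    return [np.where(np.abs(c) >= t, c, 0.0) for c, t in zip(coeffs_by_level, thresholds)]
```

Because the thresholds are driven by U_j, levels where the filter has small Fourier coefficients are thresholded more severely without any explicit knowledge of the ill-posedness coefficient in the threshold itself.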

4 Main results 

Let $p > 1$, $R > 0$, $\rho > 1$, $q > 1$ and $s > 1/\rho$, where $p$ is the exponent of the loss and $\rho$ the
integrability index of the Besov ball. We distinguish three cases for the regularity
parameters characterizing the target functions according to the sign of
$\epsilon = \frac{2s + 2\nu + 1}{p} - \frac{2\nu + 1}{\rho}$:
the sparse case ($\epsilon < 0$), the critical case ($\epsilon = 0$) and the regular case ($\epsilon > 0$).

Let us introduce the two following rates:

$r_n(s, \nu) = \Big( \frac{1}{n} \Big)^{\frac{s}{2s + 2\nu + 1}}, \qquad s_n(s, \rho, p, \nu) = \Big( \frac{\log(n)}{n} \Big)^{\frac{s - 1/\rho + 1/p}{2s + 2\nu + 1 - 2/\rho}}.$

Theorem 1. Under condition $C_{low}$ on $Y$:

$r_n(s, \nu)^{-1} R_n \gtrsim 1$ in the regular case,

$s_n(s, \rho, p, \nu)^{-1} R_n \gtrsim 1$ in the sparse and critical cases.

Theorem 2. Under condition $C_{up}$ on $Y$:

$r_n(s, \nu)^{-1} R_n \lesssim 1$ in the regular case,

$s_n(s, \rho, p, \nu)^{-1} R_n \lesssim 1$ in the sparse case,

s n {s,p, p^v)^ 1 R n < log(n)*- 1 in the critical case. 
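
The case distinction and the two rate exponents can be sketched as follows. This is a hedged illustration: it assumes that `rho` is the integrability index of the Besov ball, `p` the exponent of the loss, and that the three cases are separated by the sign of (2s + 2ν + 1)/p − (2ν + 1)/ρ; the function name is hypothetical.

```python
def rate_exponent(s, rho, p, nu):
    """Return the case name and the exponent a such that the rate is n^(-a), up to log factors."""
    eps = (2 * s + 2 * nu + 1) / p - (2 * nu + 1) / rho
    if eps > 0:
        # Regular case: exponent of r_n.
        return "regular", s / (2 * s + 2 * nu + 1)
    # Sparse and critical cases share the exponent of s_n.
    # Note: the exact comparison with 0 is only meaningful for exactly representable inputs.
    case = "critical" if eps == 0 else "sparse"
    return case, (s - 1 / rho + 1 / p) / (2 * s + 2 * nu + 1 - 2 / rho)
```

For instance, with s = 2 and rho = p = 2 the regular exponent drops from 2/5 when nu = 0 to 2/7 when nu = 1, which quantifies how the ill-posedness slows the rate.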

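Theorem 3. Under condition $C_{up}$ on $Y$, for the estimator $\hat f_n^D$ defined in (3) and if $q \le \rho$ in the
critical case:

$\sup_{f \in M(s,\rho,q,R)} E_f( \|\hat f_n^D - f\|_p ) \lesssim \Big( \frac{\log(n)^{1 + 1/\alpha}}{n} \Big)^{\frac{s}{2s + 2\nu + 1}}$ in the regular case,

$\sup_{f \in M(s,\rho,q,R)} E_f( \|\hat f_n^D - f\|_p ) \lesssim \Big( \frac{\log(n)^{1 + 1/\alpha}}{n} \Big)^{\frac{s - 1/\rho + 1/p}{2s + 2\nu + 1 - 2/\rho}}$ in the critical and sparse cases.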



When the filter satisfies C[ ow and C up the rates of Theorems 1 and 2 match except in the 
critical case when p > |, where the upper bound contains an extra logarithmic fact or. This is also 



observed in density estimation or regression problems (see Donoho et al. [1996] and Donoho et al.),
and that factor is probably part of the actual rate of $R_n$: the lower bound may be too
optimistic.

Analysing the effect of $\nu$, we remark that the rates are similar to the ones established in the
white noise model or other classical non-parametric estimation problems (examples can be found
in Tsybakov), except that here an additional effect reflected by $\nu$ slows the minimax speed.
Indeed the convolution blurs the observations, making the estimation all the more difficult as $\nu$
is large. This parameter is called the ill-posedness coefficient; explanations about this notion can
be found in Nussbaum and Pereverzev [1999] for example.

Concerning Theorem 3, we remark that estimator $\hat f_n^D$ is not optimal, first by a log factor in
the regular case, which is a common phenomenon for adaptive estimators as was highlighted in
Tsybakov [2000], and secondly by log factors with exponents proportional to $1/\alpha$. This is due
to the difficulty of controlling the deviation probability of the estimated wavelet coefficients when
the probability of having small eigenvalues $Y_l$ of the convolution operator is high (i.e. when $\alpha$ is
small).

The main interest of these results is that bounds of the minimax risk are established in a
random operator setting, for a wide scale of $L^p$ losses, and over general functional spaces which
include unsmooth functions. As far as we know, the lower bound has not been established in
deconvolution problems for such settings even in the case of deterministic filters.

Let us also note that condition $C_{up}$ imposed on the filter $Y$ is similar to the conditions
generally used in other inverse problems where the singular values of the operator are required
to decrease polynomially fast. Moreover condition $C_{up}$ concerns means of eigenvalues over dyadic
blocks, which makes it possible to include filters whose Fourier coefficients vary erratically
individually, but not in mean, such as some boxcar filters (see Kerkyacharian et al. [2004]). The
case of severely ill-posed inverse problems, where the singular values decrease exponentially fast,
has also been studied in Cavalier et al. [2003] for example.

Figure 2: Target functions

5 Simulations 

To illustrate the rates obtained for the upper bound, the behaviours of estimators $\hat f_n^D$ and $\hat f_n^R$ are
examined in practice for the following settings. We consider the four target functions (Blocks,
Bumps, Heavisine, Doppler) represented in figure 2, which were used by Donoho and Johnstone
in a series of papers (see Donoho and Johnstone [1994] for example). These functions are blurred by
convolution with realizations of a random filter $Y$ and by adding Gaussian noise with root signal
to noise ratio (rsnr) of three levels: rsnr $\in \{3, 5, 7\}$. Then the two estimators are computed
in each case and their performances are examined, judging by the mean square error (MSE).
For the simulation of the data and the implementation of the estimators, parts of the WaveD
software package written by Donoho and Raimondo for Johnstone et al. [2004] were used.



5.1 Distribution of the filter 

A simple way to represent the blurring effect is the convolution with a boxcar filter, i.e. at time $t$
one observes the mean of the unknown function on an interval $[t - a, t]$ with a random width $a$.
However these kinds of filters have various degrees of ill-posedness depending on $a$. For some
numbers called "badly approximable" numbers, this degree is constant and equal to 3/2. For other
numbers the situation is more complicated, and the set of the badly approximable numbers has
a Lebesgue measure equal to zero (more explanations can be found in Johnstone and Raimondo
[2004] or Johnstone et al. [2004]). However new results have been found recently for almost
all boxcar widths in Kerkyacharian et al. [2004], where the near optimal properties of several
thresholding estimators are established.

So as to keep a fixed ill-posedness coefficient, boxcar filters are excluded, and one considers
convolutions with periodized gamma functions with parameters $\nu$ and $\lambda$:

$Y(t) = \frac{\lambda^{\nu}}{\Gamma(\nu)} \sum_{l \ge 0} (t + l)^{\nu - 1} e^{-\lambda (t + l)},$

where $\nu$ is a fixed shape parameter and $\lambda$ is a random scale parameter with a probability
distribution function $F_\alpha$ parametrized by some $\alpha > 0$:

F a (t) = min(l,2e"^I(t > 0)), 
where the constant $C_\alpha$ is set such that $E(\lambda) = 150$ for all $\alpha$.

Figure 3: Examples of filters, from left to right: $(\nu, \lambda) \in \{(3, 150), (3, 50), (10, 150), (10, 50)\}$



Such a filter $Y$ satisfies conditions $C_{up}$ and $C_{low}$. Some examples of its shapes are given in
figure 3: $\nu$ and $\lambda$ can be interpreted respectively as a delay and a spreading parameter. According
to the minimax rates, $f$ should be (asymptotically) more difficult to estimate for large $\nu$ and for
small $\alpha$. This is checked in practice in the next section.
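
A periodized gamma filter of this kind can be evaluated on a grid as in the sketch below. It is an illustration under stated assumptions: the filter is taken to be the gamma probability density with shape `nu` and rate `lam` (normalization lam^nu / Gamma(nu)), wrapped onto [0, 1) by summing a finite number of integer shifts. The function name and the truncation `n_terms` are hypothetical choices, not taken from the WaveD package.

```python
import math
import numpy as np

def periodized_gamma(t, nu, lam, n_terms=20):
    """Sum over l = 0..n_terms-1 of the gamma(nu, rate lam) density evaluated at t + l."""
    t = np.asarray(t, dtype=float)
    shifted = t[..., None] + np.arange(n_terms)          # t + l for each shift l
    log_norm = nu * math.log(lam) - math.lgamma(nu)      # log of lam^nu / Gamma(nu)
    return np.sum(np.exp(log_norm - lam * shifted) * shifted ** (nu - 1), axis=-1)

m = 4096
grid = np.arange(m) / m
y = periodized_gamma(grid, nu=3, lam=150.0)
fourier_abs = np.abs(np.fft.fft(y)) / m                  # approximates |Y_l| for small |l|
```

Under these assumptions the Fourier coefficients satisfy |Y_l| = (1 + (2πl/λ)²)^(−ν/2), so they decay like |l|^(−ν) with a constant that shrinks when λ is small; this is consistent with small realizations of the scale parameter being the difficult ones.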

5.2 Results 

First we focus on the effect of $\nu$ conditionally on the filter $Y$. An example in medium noise for
the Blocks target is given in figure 4, where the filter is kept constant with $\lambda = 150$: as expected,
both estimators get less and less efficient when $\nu$ increases. Moreover in practice the thresholds
of estimator $\hat f_n^D$ need to be rescaled for each $\nu$, contrary to those of estimator $\hat f_n^R$, which is
thus more convenient. The same results were obtained for the other target functions and by
examining the MSE of the estimators; the figures were not included for the sake of conciseness.

Next we set $\nu = 1$ and we investigate the effect of the distribution of the filter $Y$. Both
estimators perform well for mean and high realizations of $\lambda$, but difficulties appear for small
realizations, which are all the more frequent as $\alpha$ is small: the worst case among 10 simulations is
represented in figure 5 when $\alpha = 2$ and in figure 6 when $\alpha = 0.5$, and the two estimators perform
more poorly in the last case. However they remain better in that case than a fixed threshold
estimator (i.e. with thresholds completely independent of the filter) also represented in the figures.

More generally the MSE were computed for several values of $\alpha$ and for the three noise levels.
The results are given in figure 7: the shape of the distribution of $Y$ clearly affects estimator $\hat f_n^D$,
and also $\hat f_n^R$ to a much lesser extent. The smaller $\alpha$, the poorer they behave. In particular the
Doppler and Bumps targets are not well estimated by $\hat f_n^D$ for small $\alpha$, mainly because the high
thresholds make it ignore many of the numerous details of these targets.

Finally estimator $\hat f_n^R$ proves more convenient than estimator $\hat f_n^D$ when the ill-posedness varies,
and also less sensitive to the weight of the probability of small eigenvalues.

Figure 6: Data, the two thresholding estimators and a fixed-threshold estimator (left to right) for $\alpha = 0.5$

Figure 7: Effect of $\alpha$ on the MSE of the two thresholding estimators (left and right) for each target, each level of
noise and $\alpha \in \{0.5, 0.6, 0.7, 0.8, 0.9, 1\}$ (left to right in each group)

6 Proofs of the lower and upper bounds

6.1 Lower bound 
6.1.1 Sparse case 



We use a classical lemma on lower bounds (Korostelev and Tsybakov [1993]):

Lemma 1. Let $V$ be a functional space, $d(\cdot, \cdot)$ a distance on $V$; for $f, g$ belonging to $V$ denote by
$\Lambda_n(f, g)$ the likelihood ratio: $\Lambda_n(f, g) = dP_{X_n^{(f)}} / dP_{X_n^{(g)}}$, where $P_{X_n^{(h)}}$
is the probability distribution of the process $X_n$ if $h$ is true.
If $V$ contains functions $f_0, f_1, \dots, f_K$ such that:

• $d(f_k, f_{k'}) \ge \delta > 0$ for $k \ne k'$,

• $K \ge \exp(\lambda_n)$ for some $\lambda_n > 0$,

• $\Lambda_n(f_0, f_k) = \exp(z_n^k - v_n^k)$, where $z_n^k$ is a random variable such that there exists $\pi_0 > 0$ with
$P(z_n^k \ge 0) \ge \pi_0$, and $v_n^k$ are constants,

then

$\sup_{f \in V} P_{X_n^{(f)}}\big( d(\hat f_n, f) \ge \delta/2 \big) \ge \pi_0 / 2,$

for an arbitrary estimator $\hat f_n$.

To use this result, we build a finite set of functions belonging to $M(s, \rho, q, R)$ as follows. Let
$(\psi_{j,k})_{j \ge -1,\, k \in \mathbb{Z}}$ be an $s$-regular Meyer wavelet basis, which we periodize according to:

$\Psi_{j,k}(x) = \sum_{l \in \mathbb{Z}} \psi_{j,k}(x + l).$

In the sequel we denote by $(\Psi_{j,k})_{(j,k) \in \Lambda}$ the periodized Meyer wavelet basis obtained this way,
where $\Lambda = \{(j, k) \mid j \ge -1;\ k \in R_j\}$ and $R_j = \{0, \dots, 2^j - 1\}$.

Now for a fixed level of resolution $j$ set for any $k \in R_j$:

$f_{j,k} = \gamma \Psi_{j,k},$

with $\gamma \lesssim 2^{-j(s + 1/2 - 1/\rho)}$ such that $\|f_{j,k}\|_{s,\rho,q} \le R$. Set also $f_0 = 0$.

Let us choose for $d$ the distance $d(f, g) = \|f - g\|_p$. Because of the relation between the $L^p$
norm of a linear combination of wavelets of fixed resolution $j$ and the $l_p$ norm of the corresponding
coefficients (see Meyer [1990]), we have for any $k, k' \in R_j$, $k \ne k'$:

$d(f_{j,k}, f_{j,k'}) = \gamma \|\Psi_{j,k} - \Psi_{j,k'}\|_p \asymp \gamma\, 2^{j(1/2 - 1/p)}.$

In this framework we have: $K = 2^j$ and $\delta \asymp \gamma\, 2^{j(1/2 - 1/p)}$. So as to apply the lemma, we have to
find parameters $\gamma(n)$ and $j(n)$ such that the other hypotheses of the lemma are satisfied, which
will be true if:

$P_{f_{j,k}}\big( \ln(\Lambda_n(f_0, f_{j,k})) \ge -j(n) \ln(2) \big) \ge \pi_0 > 0,$

uniformly for all $f_{j,k}$. Moreover we have:

$P_{f_{j,k}}\big( \ln(\Lambda_n(f_0, f_{j,k})) \ge -j(n) \ln(2) \big) \ge 1 - P_{f_{j,k}}\big( |\ln(\Lambda_n(f_0, f_{j,k}))| > j(n) \ln(2) \big)$
$\qquad \ge 1 - E_{f_{j,k}}\big( |\ln(\Lambda_n(f_0, f_{j,k}))| \big) / (j(n) \ln(2)).$

So the previous condition is satisfied when $\gamma(n)$ and $j(n)$ are chosen such that, with a constant
$0 < c < 1$:

$E_{f_{j,k}}\big( |\ln(\Lambda_n(f_0, f_{j,k}))| \big) \le c\, j(n) \ln(2). \qquad (4)$

Consider two hypotheses $f_0$ and $f_{j,k}$, and let us determine the likelihood ratio of the
corresponding distributions of the observations $(X_n(t), Y(t))_{t \in [0,1]}$. Let $F$ be a bounded measurable
function. Since $Y$ is assumed to be independent of $W$ and free with respect to $f$ in (2), we have:

$E_{f_{j,k}}[F(X_n, Y)] = E\Big[ E\Big\{ F\Big( \Big( \int_0^t f_{j,k} * Y(s)\, ds + \sigma n^{-1/2} W(t),\ Y(t) \Big)_{t \in [0,1]} \Big) \,\Big|\, Y \Big\} \Big]$
$\qquad = \int E\{ F(\sigma n^{-1/2} \tilde W, y) \}\, dP_Y(y),$

where $P_Y$ denotes the distribution of $Y$ and $\tilde W(t) = W(t) + \int_0^t \sigma^{-1} n^{1/2} f_{j,k} * y(s)\, ds$.

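For a given function $y$ let $h^y_{j,k}$ be defined by: $h^y_{j,k}(t) = \sigma^{-1} n^{1/2} f_{j,k} * y(t)$. We assumed that
$Y$ takes its values in $L^2([0, 1])$ so for each of its realizations there exists a constant $C_y$ such that
for all $t \in [0, 1]$, $\int_0^t (h^y_{j,k})^2(s)\, ds \le C_y$ and we can apply Girsanov's formula: the process $\tilde W$
is a Wiener process under the probability $Q$ defined by

$dQ = \exp\Big[ -\int_0^1 h^y_{j,k}(t)\, dW(t) - \frac{1}{2} \int_0^1 (h^y_{j,k}(t))^2\, dt \Big]\, dP.$

Thus for any function $y$:

$E_P[F(\sigma n^{-1/2} \tilde W, y)] = E_Q\Big[ F(\sigma n^{-1/2} \tilde W, y) \exp\Big[ \int_0^1 h^y_{j,k}(t)\, dW(t) + \frac{1}{2} \int_0^1 (h^y_{j,k}(t))^2\, dt \Big] \Big]$

$\qquad = E_Q\Big[ F(\sigma n^{-1/2} \tilde W, y) \exp\Big[ \int_0^1 h^y_{j,k}(t)\, d\tilde W(t) - \frac{1}{2} \int_0^1 (h^y_{j,k}(t))^2\, dt \Big] \Big]$

$\qquad = E_P\Big[ F(\sigma n^{-1/2} W, y) \exp\Big[ \int_0^1 h^y_{j,k}(t)\, dW(t) - \frac{1}{2} \int_0^1 (h^y_{j,k}(t))^2\, dt \Big] \Big].$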



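So finally:

$\Lambda_n(f_0, f_{j,k}) = \exp\Big[ -\int_0^1 \frac{f_{j,k} * Y(t)}{\sigma n^{-1/2}}\, dW(t) + \frac{1}{2} \int_0^1 \Big( \frac{f_{j,k} * Y(t)}{\sigma n^{-1/2}} \Big)^2 dt \Big].$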



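We can now examine under which conditions (4) is true. We have:

$E|\ln(\Lambda_n(f_0, f_{j,k}))| = E\Big| \frac{\gamma \sqrt{n}}{\sigma} \int_0^1 \Psi_{j,k} * Y(t)\, dW(t) - \frac{\gamma^2 n}{2 \sigma^2} \int_0^1 (\Psi_{j,k} * Y(t))^2\, dt \Big| \le A_n + B_n$, with:

$B_n = \frac{\gamma^2 n}{2 \sigma^2} E\Big( \int_0^1 (\Psi_{j,k} * Y(t))^2\, dt \Big),$

$A_n = \frac{\gamma \sqrt{n}}{\sigma} E\Big| \int_0^1 \Psi_{j,k} * Y(t)\, dW(t) \Big| \le \frac{\gamma \sqrt{n}}{\sigma} \Big\{ E\Big( \int_0^1 \Psi_{j,k} * Y(t)\, dW(t) \Big)^2 \Big\}^{1/2} \le (2 B_n)^{1/2},$

where we used Jensen's inequality for $A_n$.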

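Let us find a bound for $B_n$. We introduce the Fourier coefficients of $Y$ and $\Psi_{j,k}$ denoted by
$Y_l$ and $\Psi_{j,k,l}$ for all $l \in \mathbb{Z}$. Since the Fourier transform of $\psi_{j,k}$ is bounded by $2^{-j/2}$ we have:

$B_n = \frac{\gamma^2 n}{2 \sigma^2} E\Big( \sum_{l \in C_j} |\Psi_{j,k,l}|^2 |Y_l|^2 \Big) \le \frac{\gamma^2 n}{2 \sigma^2}\, 2^{-j} E\Big( \sum_{l \in C_j} |Y_l|^2 \Big),$

where $C_j$ is the set of integers where the coefficients $\Psi_{j,k,l}$ are not equal to zero (it can easily be
shown that this set does not depend on $k$).

The support of the Fourier transform of the Meyer wavelet is included in $[-\frac{8\pi}{3}, -\frac{2\pi}{3}] \cup [\frac{2\pi}{3}, \frac{8\pi}{3}]$.
So $\Psi_{j,k,l} = 0$ as soon as $|2\pi l 2^{-j}| \in [\frac{2\pi}{3}, \frac{8\pi}{3}]^c$, and $C_j \subset [-2^{j+1}, -2^{j-2}] \cup [2^{j-2}, 2^{j+1}]$ for all $j$.

Then under condition $C_{low}$ and noticing that $Y_{-l} = \overline{Y_l}$ we obtain:

$B_n \lesssim \gamma^2 n\, 2^{-2\nu j}.$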

Finally, condition (4) holds if we choose $\gamma$ and $j$ such that:

$\gamma^2 n\, 2^{-2\nu j} \lesssim j$, and $\gamma \lesssim 2^{-j(s + 1/2 - 1/\rho)}$.

We choose the following values that satisfy those two conditions:

$\gamma \asymp 2^{-j(s + 1/2 - 1/\rho)}$ and $2^j \asymp (n / \log(n))^{1/(2s + 2\nu + 1 - 2/\rho)}$.

Finally, using the lemma and Markov's inequality, for $\sigma((X_n(t), Y(t)), t \in [0, 1])$-measurable
estimators $\hat f_n$ the following bound holds:

$\inf_{\hat f_n} \sup_{f \in M(s,\rho,q,R)} E_f( \|\hat f_n - f\|_p ) \gtrsim \gamma\, 2^{j(1/2 - 1/p)} \asymp \Big( \frac{\log(n)}{n} \Big)^{\frac{s - 1/\rho + 1/p}{2s + 2\nu + 1 - 2/\rho}}.$
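
As a check of this choice of parameters, the following worked step (added here as a sketch, writing $\rho$ for the integrability index of the Besov ball and $p$ for the exponent of the loss) verifies both the constraint on $\gamma$ and $j$ and the resulting rate:

```latex
\gamma \asymp 2^{-j(s + 1/2 - 1/\rho)}, \quad
2^{j} \asymp (n / \log n)^{1/(2s + 2\nu + 1 - 2/\rho)}
\;\Longrightarrow\;
\gamma^{2} n\, 2^{-2\nu j} \asymp n\, 2^{-j(2s + 2\nu + 1 - 2/\rho)} \asymp \log n \asymp j,
\\
\gamma\, 2^{j(1/2 - 1/p)} \asymp 2^{-j(s - 1/\rho + 1/p)}
\asymp \Big( \frac{\log n}{n} \Big)^{\frac{s - 1/\rho + 1/p}{2s + 2\nu + 1 - 2/\rho}}.
```

The logarithm in the sparse rate therefore comes from the requirement that the log-likelihood stays of order $j \asymp \log n$.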

6.1.2 Regular case 

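Here we consider another set of functions belonging to $M(s, \rho, q, R)$. We use the periodized
Meyer wavelet basis as before. But now we set for any $\epsilon \in \{-1, +1\}^{R_j}$:

$f_{j,\epsilon} = \gamma \sum_{k \in R_j} \epsilon_k \Psi_{j,k},$

with $\gamma \lesssim 2^{-j(s + 1/2)}$ such that $\|f_{j,\epsilon}\|_{s,\rho,q} \le R$. We also set $I_{j,k} = [\frac{k}{2^j}, \frac{k+1}{2^j}]$.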



We use an adaptation of Lemma 10.2 in Härdle et al. [1998] to the case of Meyer wavelets
(that do not have compact supports) and of the norm $\|\cdot\|_p$:

Lemma 2. Suppose the likelihood ratio satisfies for some constant $\lambda$:

$P_{f_{j,\epsilon}}\big( \Lambda_n(f_{j,\epsilon^k}, f_{j,\epsilon}) \ge e^{-\lambda} \big) \ge p_* > 0,$

uniformly for all $f_{j,\epsilon}$ and all $k \in R_j$, where $\epsilon^k$ is equal to $\epsilon$ except for the $k$-th element which is
multiplied by $-1$. Then the following bound holds:

$\max_{\epsilon} E_{f_{j,\epsilon}}( \|\hat f_n - f_{j,\epsilon}\|_p ) \ge C\, 2^{j/2} \gamma\, e^{-\lambda} p_*,$

where $C$ is positive and depends only on $p$.

Similarly to the sparse case, the hypothesis of this lemma is satisfied if, for a small enough
constant $c$:

$E_{f_{j,\epsilon}} |\ln(\Lambda_n(f_{j,\epsilon^k}, f_{j,\epsilon}))| \le c.$

Now the log-likelihood is equal to:

$\ln(\Lambda_n(f_{j,\epsilon^k}, f_{j,\epsilon})) = \frac{2 \gamma \sqrt{n}}{\sigma} \int_0^1 \epsilon_k \Psi_{j,k} * Y(t)\, dW(t) - \frac{2 \gamma^2 n}{\sigma^2} \int_0^1 [\Psi_{j,k} * Y(t)]^2\, dt.$

As before, we only need to dominate the following quantity:

$B_n = \gamma^2 n\, E_{f_{j,\epsilon}}\Big( \int_0^1 [\Psi_{j,k} * Y(t)]^2\, dt \Big).$

We use the same bound as in the sparse case, under assumption $C_{low}$. The parameters have to
be chosen such that:

$\gamma^2 n\, 2^{-2\nu j} \lesssim 1$ and $\gamma \lesssim 2^{-j(s + 1/2)}$.

Finally the regular rate is obtained for the following choices:

$\gamma \asymp 2^{-j(s + 1/2)}$ and $2^j \asymp n^{1/(2s + 2\nu + 1)}$.


Proof of the lemma

The Meyer wavelet satisfies: there exists $A > 0$ such that $|\psi(x)| \le \frac{A}{1 + |x|^2}$. Consequently:
\^j,k\^)^y ) — * ' r \ I i / y ~r ^~ <>)\- ">■•!<, 

„i „i 

a/p 



l/p = 2 j(i- 


J> ( 








i> ( 






> 2 j( ^~ 


i> ( 


> c2 i{ ^ 


4> 



>^^(fi*)r*-^E^)"' 



for $j$ large enough, where $c > 0$ depends only on $p$.

Then using a concavity inequality and similar arguments as in the compact support case, we 
have: 



msxE f . t (\\f n ~ he\\p) > 2 2 E^>, E / \fn ~ he\ P f P 

e k=0 7 J. fc 

e k=0 Jl 3,k 

k=0 e\e k =l T 3,k Jl j,k 
23 — 1 

'"^ E E / " ^ + M/*,^ / l/n " /^| P > ^}] 

fc=0 e|e fc =l ^ ^ 



> 2- 2J+j( p 
with $\delta = C \gamma\, 2^{j(1/2 - 1/p)}$.
Noticing that 



( / |/n - fj,e\ P ) 1/P +([ \fn- f^f" > H [ l*^)^ > 2 7 c2^4) 

•> Ij,k Ij,k Ij,k 

for $j$ large enough, the end of the proof follows as in Härdle et al. [1998]. □



6.2 Upper bounds 

6.2.1 Properties of the estimated wavelet coefficients 

The performances of the thresholding estimators rest on the properties of the estimated wavelet
coefficients $\hat\beta_{j,k}$. In the sequel we will also need properties for the estimators $\hat\alpha_{j,k}$ defined the
same way as $\hat\beta_{j,k}$ in estimator (3) except with $\Phi$ instead of $\Psi$. We have the following results:

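Proposition 1. Under condition $C_{up}$ we have for all $j \ge -1$, $k \in R_j$ and $r > 0$,

$E(|\hat\beta_{j,k} - \beta_{j,k}|^r) \lesssim \Big( \frac{2^{\nu j}}{\sqrt{n}} \Big)^r$ and $E(|\hat\alpha_{j,k} - \alpha_{j,k}|^r) \lesssim \Big( \frac{2^{\nu j}}{\sqrt{n}} \Big)^r,$

and there exist positive constants $\kappa$ and $\kappa'$ such that for all $\lambda \ge 1$,

$P\Big( |\hat\beta_{j,k} - \beta_{j,k}| \ge \frac{2^{\nu j} \lambda}{\sqrt{n}} \Big) \le 2^{-\kappa \lambda^{2\alpha/(1+\alpha)}}$ and $P\Big( |\hat\beta_{j,k} - \beta_{j,k}| \ge \frac{\sqrt{U_j}\, \lambda}{\sqrt{n}} \Big) \le 2^{-\kappa' \lambda^2},$

where the constants in the inequalities do not depend on $j$, $k$ and $\lambda$.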



Proof of Proposition 1

Remark that conditionally on the process $Y$, $(\hat\beta_{j,k} - \beta_{j,k})$ is a centered Gaussian variable with
variance:

$Var(\hat\beta_{j,k} - \beta_{j,k} \mid Y) = E\Big[ \frac{\sigma^2}{n} \Big| \sum_{l} \frac{W_l}{Y_l}\, \overline{\Psi_{j,k,l}} \Big|^2 \,\Big|\, Y \Big].$

Since the Fourier transform of the Meyer wavelet is bounded by $2^{-j/2}$ and only
$l \in [-(2^{j+1} - 1), -2^{j-2}] \cup [2^{j-2}, 2^{j+1} - 1]$ has to be considered, we have for some constant $C > 0$:

$Var(\hat\beta_{j,k} - \beta_{j,k} \mid Y) \le C U_j / n.$

Thus the moment of order $r$ of $(\hat\beta_{j,k} - \beta_{j,k})$ is bounded by

$E(|\hat\beta_{j,k} - \beta_{j,k}|^r) \lesssim E[ (Var(\hat\beta_{j,k} - \beta_{j,k} \mid Y))^{r/2} ] \lesssim E[ (U_j / n)^{r/2} ],$

and by similar arguments the same bound holds for $(\hat\alpha_{j,k} - \alpha_{j,k})$ because the support of the
Fourier transform of $\phi_{j,k}$ is included in $[-2^j, 2^j]$.

For the deviation probability we use a probabilistic inequality for a centered standard Gaussian
variable $Z$. Conditionally on $Y$ we have:

P(\Pj, k -Pj, k \>^ \Y)<P(\Z\>X^/(CUf) \Y) 

1 A 2 2 2 ^' 
X^]2^o/{CUj) 2CU j 
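
The Gaussian tail inequality used in this step can be checked numerically. The sketch below compares the exact two-sided tail $P(|Z| > t)$ with the classical Mills-type bound $2e^{-t^2/2}/(t\sqrt{2\pi})$, and evaluates the bound at $t = \lambda 2^{\nu j}/(CU_j)^{1/2}$. The function names and the sample values of $\lambda$, $\nu$, $j$, $C$, $U_j$ are ours and purely illustrative; the constant in front of the bound in the proof may be normalized differently.

```python
import math


def gaussian_two_sided_tail(t: float) -> float:
    """Exact P(|Z| > t) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(t / math.sqrt(2.0))


def mills_bound(t: float) -> float:
    """Classical bound 2 * exp(-t^2 / 2) / (t * sqrt(2 * pi)), valid for t > 0."""
    return 2.0 * math.exp(-0.5 * t * t) / (t * math.sqrt(2.0 * math.pi))


def conditional_tail_bound(lam: float, nu: float, j: int, C: float, U_j: float) -> float:
    """Mills bound at t = lam * 2^(nu * j) / sqrt(C * U_j) (illustrative values only)."""
    t = lam * 2.0 ** (nu * j) / math.sqrt(C * U_j)
    return mills_bound(t)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 4.0):
        print(t, gaussian_two_sided_tail(t), mills_bound(t))
```

The bound is loose for small $t$ and becomes sharp as $t$ grows, which is the regime $\lambda \ge 1$ of interest here.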

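Then we take the expectation over Y; by the Cauchy-Schwarz inequality we obtain for $\lambda \ge 1$:

$$P\Big(|\hat\beta_{j,k} - \beta_{j,k}| > \frac{2^{\nu j}}{\sqrt{n}}\lambda\Big) \lesssim \Big(E\Big[\frac{U_j}{2^{2\nu j}}\Big]\Big)^{1/2} \Big(E\Big[\exp\Big(-\frac{\lambda^2 2^{2\nu j}}{CU_j}\Big)\Big]\Big)^{1/2}.$$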



The end of the proof is directly deducible from the lemma below, and the last part of Proposition 1
is easily proved by replacing $2^{\nu j}$ by $\sqrt{U_j}$ in the three inequalities above.

Lemma 3. Let $X_j$ be the following random variable: $X_j = U_j / 2^{2\nu j}$. For all $j \ge 0$ there exist
positive constants $C'$, $C''$, $C(\cdot)$ such that for all $r > 0$:

$$E(e^{-r/X_j}) \le C' e^{-C'' r^{\alpha/(\alpha+1)}}, \quad\text{and}\quad E(X_j^r) \le C(r).$$

Proof of the lemma
For all $r > 0$ we have:

$$\begin{aligned}
E(e^{-r/X_j}) &= \int_0^1 P(e^{-r/X_j} \ge u)\,du \\
&= r\int_0^{+\infty} P(X_j \ge 1/u)\,e^{-ru}\,du \\
&\le r\int_0^1 P(X_j \ge 1/u)\,e^{-ru}\,du + e^{-r} \\
&\le r\int_0^1 e^{-ru - c/u^{\alpha}}\,du + e^{-r},
\end{aligned}$$

and one can check that there exists $C'' > 0$ such that $\int_0^1 e^{-ru - c/u^{\alpha}}\,du \le e^{-C'' r^{\alpha/(\alpha+1)}}$.
The second part of the lemma is easily proved by using similar arguments.

□ 
□ 
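
The step "one can check" in the proof of the lemma can be illustrated as follows. Assuming the tail condition takes the form $P(X_j \ge t) \le e^{-ct^{\alpha}}$ (our reading of the condition; the names `c` and `alpha` below are illustrative), the exponent $ru + cu^{-\alpha}$ is minimized at $u^* = (c\alpha/r)^{1/(\alpha+1)}$, where it equals $K r^{\alpha/(\alpha+1)}$ with $K = (1+\alpha)\alpha^{-\alpha/(\alpha+1)}c^{1/(\alpha+1)}$. The integrand is therefore bounded pointwise by $e^{-Kr^{\alpha/(\alpha+1)}}$, and so is the integral over $[0,1]$. The sketch below checks this numerically with a midpoint rule.

```python
import math


def laplace_exponent(r: float, c: float, alpha: float) -> float:
    """Minimum over u > 0 of r*u + c*u**(-alpha), i.e. K * r**(alpha/(alpha+1))."""
    K = (1.0 + alpha) * alpha ** (-alpha / (alpha + 1.0)) * c ** (1.0 / (alpha + 1.0))
    return K * r ** (alpha / (alpha + 1.0))


def minimizer(r: float, c: float, alpha: float) -> float:
    """Point u* = (c*alpha/r)**(1/(alpha+1)) where the exponent is minimal."""
    return (c * alpha / r) ** (1.0 / (alpha + 1.0))


def integral_midpoint(r: float, c: float, alpha: float, n_cells: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of exp(-r*u - c/u**alpha) over [0, 1]."""
    h = 1.0 / n_cells
    total = 0.0
    for i in range(n_cells):
        u = (i + 0.5) * h
        total += math.exp(-r * u - c / u ** alpha)
    return total * h


if __name__ == "__main__":
    for r in (1.0, 10.0, 100.0):
        bound = math.exp(-laplace_exponent(r, 1.0, 1.0))
        print(r, integral_midpoint(r, 1.0, 1.0), bound)
```

Any $C'' < K$ then absorbs the prefactor $r$ and the term $e^{-r}$ for $r \ge 1$, at the price of the constant $C'$.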

6.2.2 Proof of the sharp rates

In the regular and critical zones, the adaptive estimator is optimal only up to a logarithmic factor. In
order to show that the rates of Theorem 1 are sharp, we exhibit estimators achieving the rates
of Theorem 2. Those are not as interesting in practice as the adaptive estimator, since they depend on
characteristics of $f$, i.e. they are not adaptive.

We will use the following bound to estimate the risks, which holds for any $-1 \le j_m \le j_M \le \infty$
and any set of random or deterministic coefficients $\beta_{j,k}$ such that the quantities below are finite:

$$E\Big\|\sum_{j_m \le j \le j_M}\sum_{k \in R_j} \beta_{j,k}\Psi_{j,k}\Big\|_p \le C \sum_{j_m \le j \le j_M} 2^{j(1/2 - 1/p)}\Big(\sum_{k \in R_j} E|\beta_{j,k}|^p\Big)^{1/p}. \qquad (5)$$

The proof is immediate by the Minkowski inequality, the fact that $\|\sum_{k \in R_j}\beta_{j,k}\Psi_{j,k}\|_p \asymp 2^{j(1/2 - 1/p)}\|\beta_{j,\cdot}\|_{l_p}$
(established in Meyer [1990]) and a concavity argument.



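Let us denote $\nu' = \nu + 1/2$ and $\varepsilon = \rho s - \nu'(p - \rho)$, where $\rho$ is the integrability index of the Besov ball $M(s,\rho,q,R)$. We distinguish two cases: $p \le \rho$ and
$\rho < p$. In the first case $M(s,\rho,q,R)$ is included in the regular zone. By concavity we have:

$$\inf_{\hat f_n}\sup_{f \in M(s,\rho,q,R)} E_f\|\hat f_n - f\|_p \le \inf_{\hat f_n}\sup_{f \in M(s,\rho,q,R)} E_f\|\hat f_n - f\|_\rho.$$

So, in view of the expected rate, only the case $p = \rho$ needs to be considered. We take the following
linear estimator:

$$\hat f_n = \sum_{k \in R_{j_1}} \hat\alpha_{j_1,k}\Phi_{j_1,k}.$$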

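For any $f \in M(s,p,q,R)$ the risk is composed of a bias error and a stochastic error:

$$E_f\|\hat f_n - f\|_p \le A_s + A_b,$$

with:

$$A_s = E\Big\|\sum_{k \in R_{j_1}} (\hat\alpha_{j_1,k} - \alpha_{j_1,k})\Phi_{j_1,k}\Big\|_p \lesssim 2^{j_1(1/2 - 1/p)}\Big(\sum_{k \in R_{j_1}} E|\hat\alpha_{j_1,k} - \alpha_{j_1,k}|^p\Big)^{1/p} \lesssim \frac{2^{\nu' j_1}}{\sqrt{n}},$$

$$A_b = \Big\|\sum_{j \ge j_1}\sum_{k \in R_j} \beta_{j,k}\Psi_{j,k}\Big\|_p \lesssim \sum_{j \ge j_1} 2^{j(1/2 - 1/p)}\Big(\sum_{k \in R_j} |\beta_{j,k}|^p\Big)^{1/p} \lesssim \sum_{j \ge j_1} 2^{-js} \lesssim 2^{-j_1 s},$$

and we obtain the rate by choosing $j_1 = [\log_2(n)/(2s + 2\nu + 1)]$.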
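
The choice of $j_1$ balances the stochastic bound $2^{\nu' j_1}/\sqrt{n}$ against the bias bound $2^{-j_1 s}$. The following sketch (function names are ours, for illustration only) computes the real-valued balancing level $\log_2(n)/(2s+2\nu+1)$ and checks that both bounds then equal $n^{-s/(2s+2\nu+1)}$; taking the integer part changes the bias bound by at most a factor $2^s$.

```python
import math


def balanced_level(n: float, s: float, nu: float) -> float:
    """Real-valued j at which 2^{(nu + 1/2) j} / sqrt(n) equals 2^{-j s}."""
    return math.log2(n) / (2.0 * s + 2.0 * nu + 1.0)


def stochastic_bound(j: float, n: float, nu: float) -> float:
    """Bound 2^{nu' j} / sqrt(n) on the stochastic error, with nu' = nu + 1/2."""
    return 2.0 ** ((nu + 0.5) * j) / math.sqrt(n)


def bias_bound(j: float, s: float) -> float:
    """Bound 2^{-j s} on the bias error."""
    return 2.0 ** (-j * s)


def linear_rate(n: float, s: float, nu: float) -> float:
    """Resulting rate n^{-s / (2s + 2nu + 1)}."""
    return n ** (-s / (2.0 * s + 2.0 * nu + 1.0))


if __name__ == "__main__":
    n, s, nu = 1e6, 2.0, 1.0
    j = balanced_level(n, s, nu)
    print(j, stochastic_bound(j, n, nu), bias_bound(j, s), linear_rate(n, s, nu))
```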

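In the second case ($\rho < p$) we consider the following estimator:

$$\hat f_n = \sum_{k \in R_{j_1+1}} \hat\alpha_{j_1+1,k}\Phi_{j_1+1,k} + \sum_{j_1 < j \le j_2}\sum_{k \in R_j} \hat\beta_{j,k}\,1_{\{|\hat\beta_{j,k}| \ge \lambda_j\}}\Psi_{j,k},$$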

where: 



and 7/ > 2(^)5, so that we have by Proposition 1: P(|/3, )fc - Pj,h\ > Aj) ^ 2~ K ' , ' 2 (i-ii). 






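We proceed as in Donoho et al. by distinguishing six terms:

$$\begin{aligned}
\hat f_n - f ={}& \sum_{k \in R_{j_1+1}} (\hat\alpha_{j_1+1,k} - \alpha_{j_1+1,k})\Phi_{j_1+1,k} - \sum_{j > j_2}\sum_{k \in R_j} \beta_{j,k}\Psi_{j,k} \\
&+ \sum_{j_1 < j \le j_2}\sum_{k \in R_j} (\hat\beta_{j,k} - \beta_{j,k})\Psi_{j,k}\big[1_{\{|\hat\beta_{j,k}| \ge \lambda_j,\ |\beta_{j,k}| < \lambda_j/2\}} + 1_{\{|\hat\beta_{j,k}| \ge \lambda_j,\ |\beta_{j,k}| \ge \lambda_j/2\}}\big] \\
&- \sum_{j_1 < j \le j_2}\sum_{k \in R_j} \beta_{j,k}\Psi_{j,k}\big[1_{\{|\hat\beta_{j,k}| < \lambda_j,\ |\beta_{j,k}| > 2\lambda_j\}} + 1_{\{|\hat\beta_{j,k}| < \lambda_j,\ |\beta_{j,k}| \le 2\lambda_j\}}\big] \\
={}& e_s + e_b + e_{bs} + e_{bb} + e_{sb} + e_{ss},
\end{aligned}$$

where each of the six terms is taken with its sign.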
As before, the stochastic error is bounded by:

$$E(\|e_s\|_p) \lesssim \frac{2^{\nu' j_1}}{\sqrt{n}},$$

and by using Sobolev embeddings it is easy to see that:

$$E(\|e_b\|_p) \lesssim 2^{-j_2(s - 1/\rho + 1/p)}.$$

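The terms $e_{bs}$ and $e_{sb}$ can be grouped together because of the two following assertions:
$\{|\hat\beta_{j,k}| < \lambda_j,\ |\beta_{j,k}| > 2\lambda_j\} \cup \{|\hat\beta_{j,k}| \ge \lambda_j,\ |\beta_{j,k}| < \lambda_j/2\} \subset \{|\hat\beta_{j,k} - \beta_{j,k}| > \lambda_j/2\}$, and
$[|\hat\beta_{j,k}| < \lambda_j,\ |\beta_{j,k}| > 2\lambda_j] \Rightarrow [|\beta_{j,k}| \le 2|\hat\beta_{j,k} - \beta_{j,k}|]$. Consequently: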



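$$\begin{aligned}
E(\|e_{bs}\|_p + \|e_{sb}\|_p) &\lesssim \sum_{j_1 < j \le j_2} 2^{j(1/2 - 1/p)}\Big(\sum_{k \in R_j} E\big(|\hat\beta_{j,k} - \beta_{j,k}|^p\,1_{\{|\hat\beta_{j,k} - \beta_{j,k}| > \lambda_j/2\}}\big)\Big)^{1/p} \\
&\le \sum_{j_1 < j \le j_2} 2^{j(1/2 - 1/p)}\Big(\sum_{k \in R_j} \big(E|\hat\beta_{j,k} - \beta_{j,k}|^{2p}\big)^{1/2}\big(P\{|\hat\beta_{j,k} - \beta_{j,k}| > \lambda_j/2\}\big)^{1/2}\Big)^{1/p} \\
&\lesssim \sum_{j_1 < j \le j_2} \frac{2^{\nu' j}}{\sqrt{n}}\,\sup_{k \in R_j}\big(P\{|\hat\beta_{j,k} - \beta_{j,k}| > \lambda_j/2\}\big)^{1/(2p)} \\
&\lesssim \frac{2^{\nu' j_1}}{\sqrt{n}},
\end{aligned}$$

where we used the Cauchy-Schwarz inequality and Proposition 1; the last step holds because the choice of $\eta'$ makes the deviation probabilities decay geometrically in $j - j_1$, fast enough to compensate the factor $2^{\nu'(j - j_1)}$.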
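
The two set relations used to group $e_{bs}$ and $e_{sb}$ are elementary consequences of the triangle inequality. As an illustration, the sketch below checks them exhaustively on an integer grid of values for $(\hat\beta_{j,k}, \beta_{j,k})$ with threshold $\lambda_j = 10$; the function name, the grid and the threshold value are ours and have no meaning beyond this check.

```python
import itertools


def check_grouping(lam: int, grid) -> bool:
    """Check, for every (b_hat, b) on the grid, the two assertions:
    (1) {|b_hat| < lam, |b| > 2 lam} or {|b_hat| >= lam, |b| < lam/2}
        implies |b_hat - b| > lam / 2;
    (2) {|b_hat| < lam, |b| > 2 lam} implies |b| <= 2 |b_hat - b|.
    """
    for b_hat, b in itertools.product(grid, repeat=2):
        small_big = abs(b_hat) < lam and abs(b) > 2 * lam
        big_small = abs(b_hat) >= lam and abs(b) < lam / 2
        if (small_big or big_small) and not abs(b_hat - b) > lam / 2:
            return False
        if small_big and not abs(b) <= 2 * abs(b_hat - b):
            return False
    return True


if __name__ == "__main__":
    print(check_grouping(10, range(-50, 51)))
```

A finite grid is of course not a proof; it only guards against sign or threshold slips when the events are rewritten.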
For $e_{bb}$ we use the characterization of Besov spaces:

E(\\e bb \\ P ) < £ 2*<H>( £ £|4 fc - 



v / j —\i-j,k rj,K t ~\\Pj±\^j I *S > P 

h<i<h keRj 



ji<j<32 keR 3 " ' A ' 2 



< 



£ ( ,,^ -^-^(11/11^)^ 

" 2 [j -nr 



jl<3<32 " ' ' ' " 

n 2p h<j<h W JiJ 2 

Lastly for $e_{ss}$ we remark that $|\beta_{j,k}|^p \le (2\lambda_j)^{p-\rho}|\beta_{j,k}|^{\rho}$ and we use again the characterization
of Besov spaces:

E(\\e ss \\ P )< E 2^-^((2A,-r P E W)* 

£ E ( s (i-ii)^(ll/li;,oo) p )^ 

h<i<h n 2 

<4f E (2-^0- -il)^)" 

" 2p ii<i<j 2 

According to these bounds $e_{bs}$, $e_{sb}$ and $e_s$ are of the same order and $e_{ss}$ dominates $e_{bb}$, so we
choose $j_1$ and $j_2$ so as to balance the bounds of $e_b$, $e_s$ and $e_{ss}$.
In the regular zone we have:

E(\\ess\\ P ) < 

n 2 

and in the sparse zone: 

n 2 

Thus with the announced choices of $j_1$ and $j_2$ we get the prescribed rates in both zones.
Lastly in the critical zone we change the bounds on $e_{bb}$ and $e_{ss}$ by using:

h<3<32 keRj P 

< (ll/ll^) 9 if -><?• 

p 

Here again e ss is dominant and of the order: £?(||e ss ||p) < 2 p j 2 piJ , hence the extra 
logarithmic factor. 



6.2.3 Proof of the rates of the adaptive estimator 

To prove Theorem 3 we use a theorem for thresholding algorithms established by Kerkyacharian
and Picard (Theorem 3.1 in Kerkyacharian and Picard [2000]), which holds in a very general
setting where one wants to estimate an unknown function $f$ from observations in a sequence
of statistical models $(E_n)_{n \in N}$. It uses the Temlyakov inequalities; let us first recall this notion.

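Definition 2. Let $e_n$ be a basis in $L_p$. It satisfies the Temlyakov property if there are absolute
constants $c$ and $C$ such that for all $\Lambda \subset N$:

$$c\sum_{n \in \Lambda}\int |e_n(x)|^p\,dx \le \int \Big(\sum_{n \in \Lambda}|e_n(x)|^2\Big)^{p/2}dx \le C\sum_{n \in \Lambda}\int |e_n(x)|^p\,dx.$$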

Now let $(\psi_{j,k})_{j,k}$ denote a periodized wavelet basis and let $p > 1$ and $0 < r < p$. Assume that
there exist a positive value $\delta > 0$, a positive sequence $(\sigma_j)_{j \ge -1}$, a positive sequence $c_n$ tending
to 0, and a subset $\Lambda_n$ of $N^2$ such that:

$|\Lambda_n| \sim c_n^{-\delta}$, where $|S|$ denotes the cardinal of the set $S$, (6)
$(\sigma_j\psi_{j,k})_{j,k}$ satisfies the Temlyakov property, (7)
$\sup_n [\mu\{\Lambda_n\}c_n^p] < \infty$, (8)

where $\mu$ is the following measure on $N^2$:

$$\mu(j,k) = \|\sigma_j\psi_{j,k}\|_p^p = 2^{j(p/2 - 1)}\sigma_j^p\|\psi\|_p^p.$$

Assume also that we have a statistical procedure yielding estimators $\hat\beta_{j,k}$ of the wavelet
coefficients $\beta_{j,k}$ of $f$ in the basis $(\psi_{j,k})_{j,k}$ and a positive value $\eta > 0$ such that for all $(j,k) \in \Lambda_n$:

$E(|\hat\beta_{j,k} - \beta_{j,k}|^{2p}) \le C(c_n\sigma_j)^{2p}$, (9)
$P(|\hat\beta_{j,k} - \beta_{j,k}| \ge \eta\sigma_j c_n/2) \le C\min(c_n^{2p}, c_n^4)$. (10)

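Finally let $l_{r,\infty}(\mu)$ and $A(c_n^{p-r})$ be the following spaces and let $\hat f_n$ be the following estimator:

$$l_{r,\infty}(\mu) = \Big\{f :\ \sup_{\lambda > 0}\big[\lambda^r \mu\{(j,k) :\ |\beta_{j,k}| > \sigma_j\lambda\}\big] < \infty\Big\},$$

$$A(c_n^{p-r}) = \Big\{f :\ \sup_n c_n^{-(p-r)}\Big\|f - \sum_{(j,k) \in \Lambda_n}\beta_{j,k}\psi_{j,k}\Big\|_p^p < \infty\Big\},$$

$$\hat f_n = \sum_{(j,k) \in \Lambda_n} \hat\beta_{j,k}\,1_{\{|\hat\beta_{j,k}| \ge \eta\sigma_j c_n\}}\psi_{j,k}.$$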

Theorem 4. Using the objects defined above and under the hypotheses (6) to (10), we have the
following equivalence:

$$E\|\hat f_n - f\|_p^p \lesssim c_n^{p-r} \iff f \in l_{r,\infty}(\mu) \cap A(c_n^{p-r}).$$



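We adapt this to the adaptive estimator by setting, for given $p > 1$, $\rho \ge 1$, $s \ge 1/\rho$ and $q \ge 1$:

$$c_n = \sqrt{\frac{\log(n)}{n}}, \quad \sigma_j = 2^{\nu j}, \quad 2^{j_1} \asymp \Big(\frac{n}{\log(n)}\Big)^{1/(1+2\nu)}, \quad \Lambda_n = \{(j,k) \mid -1 \le j < j_1,\ k \in R_j\}.$$

With these choices we have:

$$|\Lambda_n| \asymp 2^{j_1} \asymp c_n^{-2/(1+2\nu)},$$

$$\mu(\Lambda_n) \asymp \sum_{j=0}^{j_1 - 1} 2^j\,2^{j(p/2 - 1)}\,2^{p\nu j} \asymp 2^{j_1 p(\nu + 1/2)}.$$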

Consequently (6) and (8) hold with $\delta = 2/(1 + 2\nu)$. Condition (7) is also satisfied; the proof
can be found in Johnstone et al. [2004]. Moreover thanks to Proposition 1, it is easy to establish
that the estimators (3j : k used by © satisfy (0 and lfTU)l as soon as rj > 2( max ^ ,p ^ )~2S~. 

Then we prove Theorem 3 by setting $r$ such that the right hand side of the inequality in the
first point of the theorem corresponds to the rates in the sparse and in the regular case, i.e.:

$$r = p - 2p\,\frac{s - 1/\rho + 1/p}{2s + 2\nu + 1 - 2/\rho} \quad\text{or}\quad r = p - 2p\,\frac{s}{2s + 2\nu + 1},$$

and by showing that the space over which the risk is maximized is included in the maxiset, if we
add the condition $q \le \rho$ in the critical case $\varepsilon = s\rho - \nu'(p - \rho) = 0$:

$$M(s,\rho,q,R) \subset l_{r,\infty}(\mu) \cap A(c_n^{p-r}).$$



The inclusion $M(s,\rho,q,R) \subset A(c_n^{p-r})$ is established in Johnstone et al. [2004], and the following
proof of $M(s,\rho,q,R) \subset l_{r,\infty}(\mu)$ uses the same arguments as Kerkyacharian et al. [2004]
for the boxcar blur. We have:

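$$\begin{aligned}
\mu\{(j,k) :\ |\beta_{j,k}| > 2^{\nu j}\lambda\} &\asymp \sum_{j \ge 0,\ k \in R_j} 2^{j(p\nu' - 1)}\,1_{\{|\beta_{j,k}| > 2^{\nu j}\lambda\}} \\
&\le \sum_j 2^{jp(\nu + 1/2)} \wedge \Big(2^{j(p/2 + \nu p - 1)}\sum_k \big(|\beta_{j,k}|/(2^{\nu j}\lambda)\big)^{\rho}\Big) \\
&\lesssim \sum_j 2^{jp(\nu + 1/2)} \wedge \Big(\frac{\varepsilon_j^{\rho}}{\lambda^{\rho}}\,2^{-j(s\rho + \nu'\rho - \nu' p)}\Big),
\end{aligned}$$

where $\nu' = \nu + 1/2$ and $\varepsilon_j \in l_q$. We cut the sum at $J$ such that $2^J \asymp \lambda^{-r/(\nu' p)}$.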
In the regular case we have:

$$\mu\{(j,k) :\ |\beta_{j,k}| > 2^{\nu j}\lambda\} \lesssim \lambda^{-r} + \lambda^{(s\rho - \nu'(p - \rho))\frac{r}{\nu' p} - \rho},$$

and the power of $\lambda$ in the second term is also exactly $-r$.

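In the critical case we obtain, since $q \le \rho$:

$$\mu\{(j,k) :\ |\beta_{j,k}| > 2^{\nu j}\lambda\} \lesssim \lambda^{-r} + \lambda^{-\rho}\sum_j \varepsilon_j^{\rho} \lesssim \lambda^{-r} + \lambda^{-\rho}\Big(\sum_j \varepsilon_j^{q}\Big)^{\rho/q} \lesssim \lambda^{-r} + \lambda^{-\rho},$$

and $r = \rho$ in this case.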

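Lastly in the sparse case (where $r > \rho$ is satisfied) we use the Sobolev embedding $B^s_{\rho,q} \subset B^{s'}_{r,r}$
with $s' = s - 1/\rho + 1/r$. We proceed as before by cutting the sum at $J$ such that $2^J \asymp \lambda^{-r/(\nu' p)}$
and noticing that $s'r + \nu' r - \nu' p = 0$. There exists $\varepsilon_j \in l_r$ such that:

$$\begin{aligned}
\mu\{(j,k) :\ |\beta_{j,k}| > 2^{\nu j}\lambda\} &\lesssim \sum_j 2^{jp\nu'} \wedge \Big(2^{j(p\nu' - 1)}\sum_k \big(|\beta_{j,k}|/(2^{\nu j}\lambda)\big)^{r}\Big) \\
&\lesssim \sum_j 2^{jp\nu'} \wedge \Big(\frac{\varepsilon_j}{\lambda}\Big)^{r} \\
&\lesssim \lambda^{-r}.
\end{aligned}$$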

Thus $\mu\{(j,k) :\ |\beta_{j,k}| > 2^{\nu j}\lambda\} \lesssim 1/\lambda^r$ for both values of $r$, and finally using the equivalence in
Theorem 4 and the Jensen inequality we obtain the prescribed rates for the risk $E\|\hat f_n - f\|_p$ of the adaptive estimator.
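
The exponent identities used in the three cases can be verified with exact rational arithmetic. The sketch below (function names are ours; $\rho$ denotes the integrability index of the Besov ball, as in our reading of the notation) computes the two values of $r$ and checks that the power of $\lambda$ in the regular case equals $-r$, that $r = \rho$ in the critical case, and that $s'r + \nu'r - \nu'p = 0$ in the sparse case.

```python
from fractions import Fraction


def r_regular(s, nu, p):
    """r = p - 2ps / (2s + 2nu + 1), the value used in the regular and critical cases."""
    s, nu, p = Fraction(s), Fraction(nu), Fraction(p)
    return p - 2 * p * s / (2 * s + 2 * nu + 1)


def r_sparse(s, nu, p, rho):
    """r = p - 2p (s - 1/rho + 1/p) / (2s + 2nu + 1 - 2/rho), used in the sparse case."""
    s, nu, p, rho = (Fraction(x) for x in (s, nu, p, rho))
    return p - 2 * p * (s - 1 / rho + 1 / p) / (2 * s + 2 * nu + 1 - 2 / rho)


def regular_second_exponent(s, nu, p, rho):
    """Power of lambda in the second term of the regular-case bound."""
    s, nu, p, rho = (Fraction(x) for x in (s, nu, p, rho))
    nu_prime = nu + Fraction(1, 2)
    eps = s * rho - nu_prime * (p - rho)
    return eps * r_regular(s, nu, p) / (nu_prime * p) - rho


def sparse_identity(s, nu, p, rho):
    """Value of s' r + nu' r - nu' p with s' = s - 1/rho + 1/r (expected to be 0)."""
    s, nu, p, rho = (Fraction(x) for x in (s, nu, p, rho))
    nu_prime = nu + Fraction(1, 2)
    r = r_sparse(s, nu, p, rho)
    s_prime = s - 1 / rho + 1 / r
    return s_prime * r + nu_prime * r - nu_prime * p


if __name__ == "__main__":
    print(r_regular(2, 1, 4), regular_second_exponent(2, 1, 4, 2))
    print(r_sparse(1, 1, 8, 2), sparse_identity(1, 1, 8, 2))
```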